Impact of Gene Annotation on RNA-seq Data Analysis
Abstract
RNA-seq has become increasingly popular in transcriptome profiling. One of the major challenges in RNA-seq data analysis is the accurate mapping of junction reads to their genomic origins. To detect splice sites in short reads, many RNA-seq aligners use a reference transcriptome to inform the placement of junction reads. However, no systematic evaluation has been performed to assess or quantify the benefits of incorporating a reference transcriptome when mapping RNA-seq reads. Meanwhile, multiple human genome annotation databases exist, including RefGene (RefSeq Gene), Ensembl, and the UCSC annotation database. The impact of the choice of annotation on gene expression estimates remains insufficiently investigated. In this chapter, we systematically characterized the impact of genome annotation choice on read mapping and gene quantification by analyzing an RNA-seq dataset generated by Illumina’s Human Body Map 2.0 Project. The impact of a gene model on the mapping of non-junction reads differs from its impact on junction reads. We demonstrated that the choice of gene model has a dramatic effect on both gene quantification and differential analysis. Our research will help RNA-seq data analysts make an informed choice of gene model in practical RNA-seq data analysis.
Similar resources
I-13: Transcriptome Dynamics of Human and Mouse Preimplantation Embryos Revealed by Single Cell RNA-Sequencing
Background: Mammalian preimplantation development is a complex process involving dramatic changes in the transcriptional architecture. However, the crucial transcriptional networks and key hub genes that regulate the progression of preimplantation embryos remain unclear. Materials and Methods: Through single-cell RNA-sequencing (RNA-seq) of both human and mouse preimplantation embryos, ...
Strategies to avoid drowning in the deep sequencing data flood
The enormous technological progress in the field of functional genomics during the last 15 years had a significant impact on animal sciences. With the development of Next Generation Sequencing it became feasible to analyze genomes and transcriptomes within short time frames and affordable costs. One major challenge of this rapid development is to manage the data flood and to perform data analys...
Optimization of next‐generation sequencing transcriptome annotation for species lacking sequenced genomes
Next-generation sequencing methods, such as RNA-seq, have permitted the exploration of gene expression in a range of organisms which have been studied in ecological contexts but lack a sequenced genome. However, the efficacy and accuracy of RNA-seq annotation methods using reference genomes from related species have yet to be robustly characterized. Here we conduct a comprehensive power analysi...
RNA-Seq analysis in MeV
Summary: RNA-Seq is an exciting methodology that leverages the power of high-throughput sequencing to measure RNA transcript counts at an unprecedented accuracy. However, the data generated from this process are extremely large and biologist-friendly tools with which to analyze it are sorely lacking. MultiExperiment Viewer (MeV) is a Java-based desktop application that allows advanced analysis o...
Comparison of RNA-Seq and Microarray in Transcriptome Profiling of Activated T Cells
To demonstrate the benefits of RNA-Seq over microarray in transcriptome profiling, both RNA-Seq and microarray analyses were performed on RNA samples from a human T cell activation experiment. In contrast to other reports, our analyses focused on the difference, rather than similarity, between RNA-Seq and microarray technologies in transcriptome profiling. A comparison of data sets derived from...